Summary

The signed square root statistic $R$ typically has cumulants of the form
$\mathrm{cum}_p(R) = \delta_{2,p} + n^{-p/2} k_p + O(n^{-(p+2)/2})$. This paper shows how to compute
$k_p$ without invoking the Bartlett identities. As an application, we show
how the family of alternatives influences the coverage accuracy of $R$, and in
particular that a bad choice of family can lead to arbitrary undercoverage
for confidence intervals based on $R$.
1. Introduction

The Bartlett identities are one of the most powerful tools available in likelihood theory. The
general set of identities goes back to Bartlett (1953a, b), though earlier versions exist in the form
of the so-called Wald identities. The first two identities were crucial in the early development of
likelihood theory (Fisher, Neyman and Pearson, Cramér and Rao). Generalizations of the identities
can be found in Skovgaard (1986), McCullagh (1987), and Mykland (1994, 1995b), among others.

Since their publication, these identities have been used for a variety of purposes, most notably
the analysis of the higher order asymptotic behaviour of the likelihood ratio statistic and its square
root $R$. This literature starts with Lawley (1956); contemporary research includes McCullagh
(1984, 1987), McCullagh and Tibshirani (1990), DiCiccio and Romano (1989), DiCiccio, Hall and
Romano (1991) and Andrews and Stafford (1993), to mention some. (There is also the parallel
research done with saddlepoint type methods, as in Barndorff-Nielsen and Cox (1979, 1984) and
Barndorff-Nielsen (1983, 1986, 1991), and the papers by Barndorff-Nielsen and Wood (1995),
Jensen (1992, 1995, 1997) and Skovgaard (1990, 1996) discussed below.) In addition to aiding the
computation of coefficients in expansions, the identities can also be used to establish asymptotic
normality and the existence of asymptotic expansions (as in the case of martingales in Mykland
(1994, 1995b)).

There is, however, also a dark side to these identities. Computations are often long and
tedious, and the answers can be hard to verify. This can to some extent be remedied with
symbolic computation, but writing such programs is also no simple task. We believe that this is the
general experience of research workers in this area, and we certainly speak from painful personal
experience, having spent a month in the Spring of 1995 showing that $\mathrm{cum}_5(R) = O(n^{-5/2})$ using
Taylor expansions and Bartlett identities (to laugh and/or cry with the author, have a look at
the (unpublished) technical report no. 411 of the Department of Statistics at the University of
Chicago).

As demonstrated in Mykland (1996), however, one can circumvent these identities to show
that, in fact, $\mathrm{cum}_p(R) = O(n^{-p/2})$. This is done by deriving cumulant behaviour from the large
deviation results of Barndorff-Nielsen and Wood (1995), Jensen (1992, 1995, 1997) and Skovgaard
(1990, 1996).

The purpose of this paper is to show that one can take this further: these large deviation
techniques not only help with orders of convergence, they also help with the computation of coefficients.
Specifically, it will normally be the case that

$$\mathrm{cum}_p(R) = \delta_{2,p} + n^{-p/2} k_p + O(n^{-(p+2)/2}) \tag{1.1}$$

(see Wallace (1958), Bhattacharya and Ghosh (1978), Hall (1992), and Mykland (1996)), and we
shall show how to find $k_p$. Similar methods seem capable of yielding higher order terms in the
expansions of the form (1.1).

In the following, we first discuss how Edgeworth and large deviation expansions hang
together, and we state a result which gives the form of the generating function of the $k_p$'s. $k_3$ and
$k_4$ are given explicitly ($k_1$ and $k_2$ are previously known). A more rigorous development involving
curved exponential families is given in Section 4. Meanwhile, in Section 3, we show how these
results can be used to analyze the effect of the alternative on the null distribution of $R$, and how
this affects the difference between nominal and actual coverage of confidence intervals.

It  should  be  emphasized  that  we  are  not  completely  doing  without  the  Bartlett  identities.  The 
coefficients  come  up  again  in  the  formulae  we  derive,  see,  e.g.,  (2.9),  and  we  invoke  the  identities 
themselves  in  Section  3.  The  Bartlett  identities  remain  a  powerful  presence  in  likelihood  theory, 
even  if  one  can  sometimes  do  without  them. 

2.  The  Main  Formula 

If an asymptotically normal statistic has density $f_n$ and cumulant generating function $K_n$,
the saddlepoint approximation has the form

$$f_n(n^{1/2}h) = \frac{1}{(2\pi K_n''(\tau_n))^{1/2}} \exp\bigl(K_n(\tau_n) - \tau_n K_n'(\tau_n)\bigr)(1 + o(1)), \tag{2.1}$$

where $K_n'(\tau_n) = n^{1/2}h$. This goes back to Daniels (1952), see also Theorem 1 of Chaganty and
Sethuraman (1985) and Theorem 1 of Mykland (1996). If we are dealing with the signed square
root statistic $R_n$, whose cumulants are of the form (1.1), it is easy to Taylor expand (2.1) to get
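the following. First of all,

$$\psi(h) = \lim_{n \to \infty} \frac{f_n(n^{1/2}h)}{\phi(n^{1/2}h)} \tag{2.2}$$

exists, $\phi$ being the standard normal density. Also,

$$\psi(h) = \exp\Bigl(\sum_{q=1}^{\infty} \frac{1}{q!}\, k_q h^q\Bigr). \tag{2.3}$$

Following the development in Section 4, we get the following formula for $\psi$.

Theorem 1. Let

$$I(\beta) = \lim\, (\ell_n(\beta) - \ell_n(\beta_0))/n \tag{2.4}$$

and

$$J(\beta) = -\lim\, \ddot\ell_n(\beta)/n, \tag{2.5}$$

where the limits are in probability under $P_\beta$. Set

$$h(\beta) = \sqrt{2}\, \mathrm{sign}(\beta - \beta_0)\, I(\beta)^{1/2}. \tag{2.6}$$

Then $\psi$ (under $P_{\beta_0}$) is given by

[display (2.7) is not recoverable from the source; only the fragments $h$ and $dh'$ survive] (2.7)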
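The exponential form of the limit of $f_n(n^{1/2}h)/\phi(n^{1/2}h)$ can be motivated by a formal calculation. The sketch below is an illustration and not a proof: it assumes that the cumulant generating function $K_n$ may be expanded termwise using (1.1), and that the remainders in (1.1) are negligible uniformly over the range of $h$ considered.

```latex
% Formal sketch only. K_n and \tau_n are as in (2.1); k_q is as in (1.1).
\begin{align*}
K_n(t) &= \tfrac{1}{2}t^2 + \sum_{q \ge 1} n^{-q/2} k_q \frac{t^q}{q!} + \cdots, \\
K_n'(\tau_n) = n^{1/2}h
  &\;\Longrightarrow\;
  \tau_n = n^{1/2}h - n^{-1/2}\sum_{q \ge 1} k_q \frac{h^{q-1}}{(q-1)!} + O(n^{-3/2}), \\
K_n(\tau_n) - \tau_n K_n'(\tau_n)
  &= \tfrac{1}{2}\bigl(\tau_n - n^{1/2}h\bigr)^2 - \tfrac{1}{2}nh^2
     + \sum_{q \ge 1} n^{-q/2} k_q \frac{\tau_n^q}{q!} \\
  &= -\tfrac{1}{2}nh^2 + \sum_{q \ge 1} k_q \frac{h^q}{q!} + O(n^{-1}), \\
K_n''(\tau_n) &= 1 + O(n^{-1}).
\end{align*}
```

Substituting these into (2.1) gives $f_n(n^{1/2}h) = \phi(n^{1/2}h)\exp\bigl(\sum_{q \ge 1} k_q h^q/q!\bigr)(1 + o(1))$, so the $k_q$ appear as the coefficients of an exponential generating function in $h$.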


In  Appendix  1  we  show  that 


where  the  second  sum  is  over  all  gi,  ^2?  •  •  •  so  that  +  2q2  +  . . .  +  kqk  +  . . .  =  g,  and  where  the  6s 
are  the  coefficients  in  the  Bartlett  identities,  i.e., 


b{qi,...,qk)  = 


(see,  e.g.,  p.  159  of  Barndorff-Nielsen  and  Cox  (1989)).  Similarly, 


(2.9) 


J(/3)  «  --  ^(/3  -  fSoY  b{qi,  ...,qk)cnm(£^'^\  . .  . . .), 

71  '  7)\  ^  V/  ■!  11^  ^  v  . 

p>2  ^ 


(2.10) 


qi  times 


gjt  times 
where

~  1  /'r\ 

•  •  -iQv)  —  •  •■iQv)  ^p+2^  ^ 

Finding the expression for the function (2.7), therefore, is purely a matter of inverting the function
$h$, substituting the inverse, and differentiating. This is easily done by symbolic
manipulation software; we have used Maple (Char et al. 1991) to get the expressions (2.12) and
(2.13) below. $k_1$ and $k_2$ are well documented in the literature, see, e.g., McCullagh (1987), p.
214. Here, we therefore give


-9/2  r  17  7 

kz  =  Cji  -C111C11C22  +  — C111C11C112  +  -  ciiicnciiii 

125  3  I  2  1  2  ^2 

-  Clll  +  <^11^23  -  2  C11C113  -  2  ^11^1112 

3 


a  2 

'To 


(2.12) 


and 


k^  -  C-® 
^4  —  ^11 


45  2  23  2  9  2 

11  1^—  C11C112C1111  -  —  C111C11C23  -  g 

45  2  1465  4  »  2  2 

-  CUC22C112  +  —  Cm  - 

45  o  9  33  99  .^2 

+  (^11^112  +  “  6  Cj] 

11 


9  2 

-  C11C22C1111 
o 

4  ^22 

2  ‘^11^22 


C11C24  “  8  cfiCii4 

11  51 

-g-  cfiC33  -  12  CiiCiii3 - —  CjjCii22 

21  3  13  3  33 

“  Y  C11C11112  -  Y  C11C222  -  Y  CiiCiiini 

113  9  455  2  341  2 

+  Y"  ‘^111^11^^22  -  yY  CmCiiCii2  -  —  CmCiiCnii 
+  Y  ^111^^11^113  +  3  C1HC11C122  +  16  +  C111C11C1112 


(2.13) 


where $c_{q_1 \cdots q_v} = \mathrm{cum}(\ell^{(q_1)}, \ldots, \ell^{(q_v)})/n$, and where we have adopted the convention from McCullagh
(1987) of using a parametrization where $c_{1q} = 0$ for $q \ge 2$.

Note that in i.i.d. problems, the forms of $I$ and $J$ are particularly straightforward:

$$I(\beta) = E_{\beta_0}(\ell_1(\beta) - \ell_1(\beta_0)) \exp(\ell_1(\beta) - \ell_1(\beta_0)) \tag{2.14}$$

and

$$J(\beta) = -E_{\beta_0}\ddot\ell_1(\beta) \exp(\ell_1(\beta) - \ell_1(\beta_0)). \tag{2.15}$$
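As a concrete check of the i.i.d. formulas, the following sketch evaluates $I$, $J$ and $h$ numerically for a Poisson model with mean $\beta$. The Poisson choice, the function names and the truncation point `x_max` are assumptions made for illustration; they do not come from the paper. The expectations under $\beta_0$ are computed by direct summation and compared with the closed forms $I(\beta) = \beta\log(\beta/\beta_0) - (\beta - \beta_0)$ and $J(\beta) = 1/\beta$.

```python
import math

def log_poisson_pmf(x, mean):
    """Log of the Poisson probability of x (log space avoids overflow in x!)."""
    return x * math.log(mean) - mean - math.lgamma(x + 1)

def loglik1(x, beta):
    """One-observation Poisson log likelihood; the -log(x!) term cancels in differences."""
    return x * math.log(beta) - beta

def loglik1_dd(x, beta):
    """Second derivative of loglik1 with respect to beta."""
    return -x / beta ** 2

def limits_I_J_h(beta, beta0, x_max=400):
    """Evaluate I, J (expectations under beta0, exponentially tilted) and h(beta)."""
    I = 0.0
    J = 0.0
    for x in range(x_max + 1):
        diff = loglik1(x, beta) - loglik1(x, beta0)
        # Weight = P_{beta0}(x) * exp(l_1(beta) - l_1(beta0)), combined in log space.
        weight = math.exp(log_poisson_pmf(x, beta0) + diff)
        I += diff * weight
        J -= loglik1_dd(x, beta) * weight
    # h(beta) = sqrt(2) * sign(beta - beta0) * sqrt(I(beta)); max() guards against rounding.
    h = math.copysign(math.sqrt(2.0 * max(I, 0.0)), beta - beta0)
    return I, J, h

beta0, beta = 1.0, 1.5
I_num, J_num, h_num = limits_I_J_h(beta, beta0)
I_closed = beta * math.log(beta / beta0) - (beta - beta0)  # Kullback-Leibler divergence
J_closed = 1.0 / beta                                      # Fisher information at beta
print(I_num, I_closed)
print(J_num, J_closed)
print(h_num)
```

The exponential weight turns an expectation under $\beta_0$ into one under $\beta$, which is why the tilted sums reproduce the Kullback-Leibler divergence and the Fisher information at $\beta$.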

Finally, observe that the above viewpoint gives a new formula for an $R^*$ statistic,

$$R^* = R + R^{-1}\log(U/R). \tag{2.16}$$

Various forms of $U$ have been investigated, see, in particular, Jensen (1992, 1997) and Skovgaard
(1996). In view of the development in Jensen's papers, it is clear that (in curved exponential and
analytic families), one can take

$$U = R/\psi(R/\sqrt{n}). \tag{2.17}$$

This  has  the  right  unconditional  large  deviation  coverage  up  to  0(n“^),  though,  obviously,  the 
conditional convergence properties are probably lost. This $R^*$ is a function of $R$, and it can be
seen as a large deviation version of Cornish-Fisher inversion.

3.  The  Accuracy  of  Confidence  Intervals 

One of the least studied phenomena of likelihood theory is the impact of the alternative
on the coverage accuracy of confidence intervals. At first, this may seem like a contradiction in
terms: coverage only concerns the behaviour of a statistic under the null hypothesis. The family
of alternatives, however, sets up the likelihood function from which $R$ is derived. Hence, different
alternatives lead to different $R$s, and hence to different behaviour under the null distribution.

From  a  traditional  likelihood  perspective,  this  may  seem  like  a  strange  consideration,  as  the 
likelihood  is  determined  by  the  actual  family  of  alternatives.  Recent  years,  however,  have  seen  the 
increasing  use  of  likelihoods  that  are  designed  to  work  under  a  multiplicity  of  null  distributions, 
and  such  likelihoods  need  a  pragmatic  and  sometimes  deliberately  wrong  specification  of  the  family 
of alternatives. Examples of this include the partial (Cox (1972, 1975), Wong (1986)), projective
(McLeish and Small (1992)) and dual (Mykland (1995a), Kong and Cox (1996)) likelihoods.
Since there are, therefore, several likelihoods that can go with the same alternative, it raises
the question of how to compare them. The debate has been particularly acute in connection with
empirical/dual likelihood, see Corcoran, Davison and Spady (1995) and Section 6 of Mykland
(1996).

The debate has mainly been on the accuracy of possible procedures. This is because the
dual and true likelihoods have the same power to first order in contiguous neighborhoods (Mykland
(1995a), Section 5), and hence, typically, also to second order (Bickel, Chibisov, and van Zwet
(1981)), and also because to third order, though the two do not have the same power, sometimes
one does better, sometimes the other (Lazar and Mykland (1997)).

The formulae in the previous section permit us to characterize the impact of the alternative
on accuracy. Consider the following setup: we are looking at log likelihoods $\ell$ having the same
score $\dot\ell$, but where we can otherwise vary $\ddot\ell$, $\dddot\ell$, and so on, as we see fit. $\ell$ being a log likelihood
implies that $\mathrm{var}(\dot\ell) + E(\ddot\ell) = 0$, so first and second order efficiency only depends on $\dot\ell$. The only
restriction we impose is that $\mathrm{cov}(\dot\ell, \ddot\ell) = \mathrm{cov}(\dot\ell, \dddot\ell) = \cdots = 0$ (as in McCullagh (1987), Chapter 7),
since this can be done by a reparametrization which does not alter the statistic $R$.
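The constraint $\mathrm{var}(\dot\ell) + E(\ddot\ell) = 0$ is the second Bartlett identity. The sketch below checks the first three identities numerically for a single Poisson observation with mean $\beta$. The model, the helper names and the truncation point are illustrative assumptions; the third identity is written in its standard moment form $E(\dddot\ell) + 3E(\dot\ell\ddot\ell) + E(\dot\ell^3) = 0$.

```python
import math

def poisson_pmf(x, mean):
    """Poisson probability of x, evaluated in log space."""
    return math.exp(x * math.log(mean) - mean - math.lgamma(x + 1))

def expect(f, beta, x_max=400):
    """E_beta f(X) for X ~ Poisson(beta), by truncated summation."""
    return sum(f(x) * poisson_pmf(x, beta) for x in range(x_max + 1))

def bartlett_residuals(beta):
    """Left-hand sides of the first three Bartlett identities; each should be 0."""
    d1 = lambda x: x / beta - 1.0        # score
    d2 = lambda x: -x / beta ** 2        # second derivative of the log likelihood
    d3 = lambda x: 2.0 * x / beta ** 3   # third derivative of the log likelihood
    first = expect(d1, beta)
    # E(score) = 0, so var(score) = E(score^2).
    second = expect(lambda x: d1(x) ** 2, beta) + expect(d2, beta)
    third = (expect(d3, beta)
             + 3.0 * expect(lambda x: d1(x) * d2(x), beta)
             + expect(lambda x: d1(x) ** 3, beta))
    return first, second, third

residuals = bartlett_residuals(2.0)
print(residuals)
```

Perturbing `d2` or `d3` by a zero-mean term leaves these expectation identities intact, which is the freedom the setup above exploits when it varies the higher derivatives while holding the score fixed.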

The expansion for the density $f_n$ of the signed square root $R$ is


fnir)  =  4>{r)!^l  +  in  hi  +  n  k[)hi{r)  +  ^  n  ^k2h2ir)  +  ^  n  ^kshsir)  +  +  l)n  ^)J  , 

(3.1) 

where $h_q(r)$ is the $q$'th Hermite polynomial. (3.1) doubles as an Edgeworth and a large deviation
expansion (cf. Chaganty and Sethuraman (1985)). In the same notation as (2.12)-(2.13), we have


that 


$$k_1 = -c_{11}^{-3/2} c_{111}/3!, \tag{3.2}$$


T  1  1  ■ 

_3 

'17  9 

^C23  +  ^ciii2  + 

-  ^11  Cm 

and 


k2 


_2  r  1  1  1 

Cii  ^C22  -  -Cii2  - 


+ 


7_ 

18 


-3„2 


^11  ^111) 


(3.4) 


where  k^  is  the  corresponding  quantity  for  an  exponential  family  with  the  same  score.  ki  and  k2 
come from p. 214 of McCullagh (1987); $k_1'$ is derived in Appendix 2. Note, incidentally, that the
$k_q$ can depend on $n$ to the extent that the $c$'s do. One could also, obviously, expand the $c$'s in
orders of $n$, but that would only deepen the messiness of expressions.
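To see the leading coefficient at work, the sketch below compares $\sqrt{n}\,E(R)$ with $k_1 = -c_{11}^{-3/2}c_{111}/3!$ for $n$ i.i.d. Poisson observations with mean $\beta_0$. The Poisson model, the function name and the truncation rule are assumptions made for illustration. The expectation is computed exactly by summing over the sufficient statistic $S \sim \mathrm{Poisson}(n\beta_0)$, so no simulation is involved. Since $c_{11}^{-3/2}c_{111}$ is the standardized third cumulant of the score, it is unchanged by a smooth increasing reparametrization, so the mean parametrization is used directly.

```python
import math

def signed_root_mean(n, beta0):
    """Exact E(R) for n i.i.d. Poisson(beta0) observations, testing beta = beta0."""
    mu = n * beta0                                   # mean of the sufficient statistic S
    s_max = int(mu + 40.0 * math.sqrt(mu)) + 50      # truncate far out in the upper tail
    total = 0.0
    for s in range(s_max + 1):
        prob = math.exp(s * math.log(mu) - mu - math.lgamma(s + 1))
        # Likelihood ratio statistic 2 * (l_n(beta_hat) - l_n(beta0)), beta_hat = s / n,
        # with the convention 0 * log(0) = 0.
        lr = 2.0 * ((s * math.log(s / mu) if s > 0 else 0.0) - (s - mu))
        r = math.copysign(math.sqrt(max(lr, 0.0)), s - mu)
        total += r * prob
    return total

beta0, n = 1.0, 400
c11 = 1.0 / beta0            # per-observation variance of the score
c111 = 1.0 / beta0 ** 2      # per-observation third cumulant of the score
k1 = -c11 ** (-1.5) * c111 / math.factorial(3)
scaled_mean = math.sqrt(n) * signed_root_mean(n, beta0)
print(scaled_mean, k1)
```

The two printed numbers should agree up to a term of order $1/n$, in line with the next contribution to $E(R)$ entering only at the $n^{-3/2}$ level.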

It is clear from this that the convergence error at the $n^{-1/2}$ level is fixed by the score $\dot\ell$, the
$n^{-1}$ behaviour depends on the score and $\ddot\ell$, the $n^{-3/2}$ behaviour on $\dot\ell$, $\ddot\ell$ and $\dddot\ell$, and so on. In itself,
this is not particularly surprising.

What is surprising, however, is that there is a radical difference between what can go wrong
at the $n^{-1}$ level and the $n^{-3/2}$ level. We shall argue below that a bad choice of $\ddot\ell$ can result in
arbitrary overcoverage, but limited undercoverage. On the other hand, a bad choice of $\dddot\ell$ can also
lead to unlimited undercoverage. The latter is, obviously, particularly dangerous.

The thing is that $k_2$ is quadratic in $\ddot\ell$, with positive sign in front of the square term. Set
$\ddot\ell_{lf} = [\dot\ell,\dot\ell] + a\dot\ell - 2\mathrm{var}(\dot\ell)$, where $[\dot\ell,\dot\ell]$ is the observed (optional) quadratic variation of $\dot\ell$, and where
$a = -\mathrm{cov}(\dot\ell, [\dot\ell,\dot\ell])/\mathrm{var}(\dot\ell)$. Suppose that $\ddot\ell = \ddot\ell_{lf} + m + R$, where $m$ is a martingale orthogonal to
$\dot\ell$, and $R$ is $O_p(1)$ and asymptotically independent of $\dot\ell$, $[\dot\ell,\dot\ell]$ and $m$. This will be the case in most
regular situations; the independent case is obvious; for Markov chains, see p. 448 of Jacod and
Shiryaev (1987); for mixing sums, see Ch. 5 of Hall and Heyde (1980), or also Jacod and Shiryaev
(1987). By the Bartlett identities for martingales (Mykland (1994)),

$$\mathrm{cov}(\ddot\ell_{lf}, m) = \mathrm{cum}(\dot\ell, \dot\ell, m), \tag{3.5}$$

and  hence 

h  =  k2,if  +  ^c7i^ivar(m)  +  o(l),  (3.6) 

where $k_{2,lf}$ is the value of $k_2$ when $\ddot\ell_{lf}$ is taken as the second derivative of $\ell$. Thus,

$$k_2 \ge k_{2,lf} + o(1), \tag{3.7}$$

establishing our claim about limited undercoverage at this level.
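The choice $\ddot\ell_{lf} = [\dot\ell,\dot\ell] + a\dot\ell - 2\mathrm{var}(\dot\ell)$ with $a = -\mathrm{cov}(\dot\ell,[\dot\ell,\dot\ell])/\mathrm{var}(\dot\ell)$ can be checked against the two constraints imposed on a second derivative in this section. The short verification below is a sketch that assumes $\dot\ell$ is a zero mean, square integrable martingale, so that $E[\dot\ell,\dot\ell] = \mathrm{var}(\dot\ell)$.

```latex
% Sanity check of the two constraints, assuming E(\dot\ell) = 0 and
% E[\dot\ell,\dot\ell] = var(\dot\ell).
\begin{align*}
E(\ddot\ell_{lf})
  &= E[\dot\ell,\dot\ell] + a\,E(\dot\ell) - 2\,\mathrm{var}(\dot\ell)
   = \mathrm{var}(\dot\ell) + 0 - 2\,\mathrm{var}(\dot\ell)
   = -\mathrm{var}(\dot\ell), \\
\mathrm{cov}(\dot\ell, \ddot\ell_{lf})
  &= \mathrm{cov}(\dot\ell, [\dot\ell,\dot\ell]) + a\,\mathrm{var}(\dot\ell)
   = \mathrm{cov}(\dot\ell, [\dot\ell,\dot\ell]) - \mathrm{cov}(\dot\ell, [\dot\ell,\dot\ell])
   = 0.
\end{align*}
```

The first line is the second Bartlett identity $\mathrm{var}(\dot\ell) + E(\ddot\ell) = 0$, and the second is the orthogonality convention $\mathrm{cov}(\dot\ell, \ddot\ell) = 0$, so $\ddot\ell_{lf}$ is an admissible second derivative in the setup of this section.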

The coefficients in the $n^{-3/2}$ term, however, tell a different story. In both $k_3$ and $k_1'$, $\dddot\ell$ enters
linearly. If we focus on $k_3$, let $\dot\ell$ and $\ddot\ell$ be given, and consider a zero mean martingale $m$, orthogonal
to $\dot\ell$, so that

cov(f,  m)  —  ^cum(i,i,  m)  =  i/n  +  o{n), 


(3.8) 




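where $\nu \neq 0$. Replace the original $\dddot\ell$ by $\dddot\ell_\alpha = \dddot\ell + \alpha m$. The new $\dddot\ell_\alpha$ satisfies the third Bartlett
identity (and is hence a valid third derivative of $\ell$), and also $\mathrm{cov}(\dot\ell, \dddot\ell_\alpha) = 0$. In this setup,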

*3, a  =  ^:3  + 

which can take on any value. In other words, both under- and overcoverage are potentially
unbounded at this level.


4.  Curved  Exponential  Families 
For a more rigorous development, consider a curved exponential family

$$\ell_n(\beta) = \ell_n(\beta_0) + (\beta - \beta_0)\dot\ell_n(\beta_0) + \tfrac{1}{2}(\beta - \beta_0)^2\ddot\ell_n(\beta_0) + \cdots \tag{4.1}$$

of order $p$ (i.e., terms of order $p + 1$ and higher are nonrandom). We shall consider $R$ for testing
$H_0: \beta = \beta_0$. Suppose that there is a valid saddlepoint approximation to the density of the vector
$(\dot\ell_n(\beta_0), \ldots, \ell_n^{(p)}(\beta_0))$. One can then proceed as follows.

Begin by fixing $\beta_1 \neq \beta_0$. Then reparametrize the family as in Section 7.2.3 (p. 204-207) of
McCullagh (1987) to make $\mathrm{cov}_{\beta_1}(\dot\ell(\beta_1), \ell^{(q)}(\beta_1)) = 0$ for $2 \le q \le p$. It is clear from McCullagh
that this is accomplished by using parameter $\phi$, given by $\phi_1 = \beta_1$, and


_  E0APiMP)-W) 

var^j(i(,/3i)) 


(4.2) 


Hence 


<Po  —  4>i  =  4>{Pq)  -  4>\ 

E0,m) 

A^i) 


(4.3) 
as $n \to \infty$ under $P_\beta$. Here $g(x) = (x - 1)e^x$, which can be replaced by $g(x) = xe^x$ since
$E_{\beta_0}\exp(\ell(\beta_1) - \ell(\beta_0)) = 1$. In the new parametrization, the null hypothesis is $\phi = \phi_0$.

Now  embed  Inifi)  -  4(/?o)  in  a  full  exponential  family.  In  the  notation  of  Jensen  (1997) 
(which we shall be using in the following), $T_\ell =$ Note that we do not require the $T_\ell$'s to
be means, only that the saddlepoint approximation hold.

Our  larger  family  is  then  (in  the  new  parametrization) 

C(<^o)  +  0\ln{4>o)  +  ■  •  •  + 

A  reparametrization  of  the  ^’s  is  given  by 


and 


ei  =  <i> 


Of  =  (j)bf  for  £>2 


(4.4) 


(in  Jensen’s  notation,  (f>  is  (3o  and  bi  is  /3^).  A  corresponding  sequence  of  null  hypotheses  is 

4>  =  (j>o 

and 

:  bf  =  I  for  f  >  2.  (4.5) 


Hence,  is  our  original  null  hypothesis. 

Let  the  ups  be  chosen  as  in  Section  3  of  Jensen  (1997).  In  view  of  Section  2  of  the  same 
paper,  the  joint  density  of  (i^i,  El, 2,  •  •  • ,  Rl,p)  is,  in  a  large  deviation  region, 


1  hi 
(27r)p/2  wi 


ii.iy 


H-0(n-')}. 


(4.6) 


Note  that  Ri  =  R.  By  using  Skorokhod  embedding,  it  therefore  follows  that,  under 


fl3o{R  I  Rl,2,---,Rl,p)  ^  ^  {l  +  Op(n-i)} 
4>{R)  Ui 


(4.7) 
Clearly, $R/\sqrt{n} = h(\beta_1)(1 + O_p(n^{-1/2}))$, where $h$ is given in (2.6). Hence, if we can show that

Fx/0^  =  (<^1  -  {l  +  Op{n-^'‘^)]  ,  (4.8) 

it  follows  from  (4.3),  and  by  averaging  over  (El, 2,  •  •  •  >  Rl,p)-,  that 

W  7(/3i)  ^ 

=  {l  +  Op(n-'/^)}  (4.9) 

under  .  Again  by  Skorokhod  embedding,  we  get 
Theorem  2.  Under  the  above  assumptions, 

MMr)  =  {l  +  0(n-i/2)}  (4.10) 

in a large deviation region $|h| < c$, with $h = r/\sqrt{n}$. ■


Theorem  1  is  an  immediate  corollary. 

It  remains  to  show  (4.8).  In  Jensen’s  (1997)  notation, 


9^-9^-^  =  _ /3f-^ , 9i  - eU  -  0i)  -  Y 

(4.11) 

where  di  —  (/3i~^)^  is  the  term  in  the  £’th  column.  Note  that  {(i^Y  ^  raised  to  power  k,  which 

is  the  only  instance  of  power  notation  in  (4.11). 

Since  corr(Ti,T^f)  ~  0,  (note  that  this  is  where  the  reparametrization  above  is  used),  - 
is  Op(n-i/2)  not  Op(l).  On  the  other  hand,  for  £>2,^1-  =  Op(u"^).  Hence,  for 


£>2, 


^  -  ^-1  =  (0,  0i-^  )^  0, . . . ,  0)  +  Op{i 


(4.12) 


Hence the determinants in equation (7) in Jensen (1997) can be evaluated by multiplying the
diagonal, and so (4.8) follows. Note that in the above argument, if $T_\ell$ is zero, one just deletes line
and column $\ell$ and makes the appropriate modification to the next column. This does not affect
the result.
Appendix 1

Obviously, if $g(x) = xe^x$,

$$I(\beta) = \frac{1}{n} E_{\beta_0}\, g(\ell(\beta) - \ell(\beta_0)), \tag{A1.1}$$

so  that  the  p’th  derivative  is 

~  -Eh,  V  5^")(0)c(gi, . . . ,  qvWoT  ' '  (A1.2) 

91 +2^2  H - Vvqv^p 

which yields (2.5) since $g^{(v)}(0) = v$ and since moments can be replaced by cumulants in the above.
The latter can either be seen by direct computation, or by observing that the right hand side of
(A1.1) is $O(1)$.
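The derivative values used here follow from a one-line induction, spelled out below for $g(x) = xe^x$ as a worked step.

```latex
% Induction on v for g(x) = x e^x.
\begin{align*}
g'(x) &= (x + 1)e^x, \\
g^{(v)}(x) = (x + v)e^x
  &\;\Longrightarrow\;
  g^{(v+1)}(x) = e^x + (x + v)e^x = (x + v + 1)e^x, \\
\text{hence}\quad g^{(v)}(0) &= v \quad \text{for all } v \ge 0.
\end{align*}
```

In particular $g(0) = 0$, so the term with $v = 0$ drops out of the expansion.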

To  find  J,  note  that 

=  E  . «>.)(<)’■•■■(«'”>)’'  (ai.3) 

^  r=0  91+292H - \-vqv=^p-r 

which  gives  (2.7)  for  the  same  reasons  as  used  above,  and  because 

^(^1 9  •  •  •  ?  Q.v)  ~  ^  ^  ^7*^  ^(^1 7  •  •  *  ?  ^r-|-l7  ^r4'2  f  ?  i  ^i;)?  ( Al.4) 


which  gives  (2.12)  by  direct  computation,  using  (2.6). 
Appendix 2

This is the calculation of (3.3). The expectation is calculated by the Taylor expansion
method as we need an additional term in $E(R_n)$ to that provided by Theorem 1. In principle, the
technology used to derive Theorem 1 can also be used to get higher order terms (like $k_1'$), but since
we are just looking for an expectation, this is a little like shooting sparrows with cannons. If one
were looking for higher order terms in higher order cumulants, an extension of Theorem 1 would
greatly ease one's life.

The derivation is here done in the multivariate setting. We shall use the notation in
McCullagh (1987). In the univariate case R = By adding the term to the stochastic
expansion (7.15) on p. 214 in McCullagh (1987), one gets that

Wr  =  W'r-\-n-^l‘^ZrsZ‘l2 

+  n-^{ZrstZ‘Z^IV.  +  {Vrstu  +  Vr,,,t,u)Z^  Z^ Z'^ / A\ 

+  3Z,,Z**Zt/8  +  bZrsZtZuv’^^'^112} 

+  ll{Urstu  +  Ur,s,i,uy'^l'v.xZ^Z'^Z^Z^IlAA  +  77} 

+  Op(n“^)  (A2.1) 

where  Wr  is  the  corresponding  quantity  for  an  exponential  family  alternative  with  the  same  score 
as  the  original  alternative,  and  where  77  is  a  generic  term  referring  to  a  linear  combination  of 
products  of  4  Z’s,  at  least  one  and  at  most  three  of  which  is  a  higher  order  derivative.  It  follows 
that 

EWr  =  EWr 

+  +  (I'ritu  + 

+  ^{Kstuv  +  15'.  +  ll{Vrstu  + 

+  0(n-®/7  (A2.2) 


Simplifying  to  univariate  notation  gives  (3.3). 